Mining Contrasting Quasi-Clique Patterns
Mining dense quasi-cliques is a well-known clustering task with applications
ranging from social networks and collaboration graphs to document analysis.
Recent work has extended this task to multiple graphs, i.e., the goal is to find
groups of vertices that are densely connected across multiple graphs. In this
paper, we argue that in a multi-graph scenario sparsity is valuable for
knowledge extraction as well. We introduce the concept of contrasting quasi-clique
patterns: a collection of vertices highly dense in one graph but highly sparse
(i.e. less connected) in a second graph. Thus, these patterns specifically
highlight the difference/contrast between the considered graphs. Based on our
novel model, we propose an algorithm that enables fast computation of
contrasting patterns by exploiting intelligent traversal and pruning
techniques. We showcase the potential of contrasting patterns on a variety of
synthetic and real-world datasets.
Comment: 10 pages
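As a concrete illustration of the pattern definition above, a minimal Python sketch: it checks whether a vertex set is dense in one graph but sparse in another. The adjacency-set representation, the thresholds gamma_hi/gamma_lo, and the toy graphs are illustrative assumptions; the paper's actual mining algorithm with its traversal and pruning techniques is not reproduced here.

```python
import itertools

def density(vertices, adj):
    """Fraction of possible vertex pairs that are edges in `adj` (undirected)."""
    pairs = list(itertools.combinations(vertices, 2))
    if not pairs:
        return 0.0
    present = sum(1 for u, v in pairs if v in adj.get(u, set()))
    return present / len(pairs)

def is_contrasting(vertices, adj_dense, adj_sparse, gamma_hi=0.8, gamma_lo=0.2):
    """A vertex set is a contrasting pattern if it is dense in one graph and
    sparse in the other (thresholds here are illustrative, not the paper's)."""
    return (density(vertices, adj_dense) >= gamma_hi
            and density(vertices, adj_sparse) <= gamma_lo)

# Toy example: {0, 1, 2} forms a triangle in graph A but has no edges in B.
A = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
B = {0: set(), 1: set(), 2: set()}
print(is_contrasting({0, 1, 2}, A, B))  # True
```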
Adversarial Attacks on Graph Neural Networks via Meta Learning
Deep learning models for graphs have advanced the state of the art on many
tasks. Despite their recent success, little is known about their robustness. We
investigate training time attacks on graph neural networks for node
classification that perturb the discrete graph structure. Our core principle is
to use meta-gradients to solve the bilevel problem underlying training-time
attacks, essentially treating the graph as a hyperparameter to optimize. Our
experiments show that small graph perturbations consistently lead to a strong
decrease in performance for graph convolutional networks, and even transfer to
unsupervised embeddings. Remarkably, the perturbations created by our algorithm
can misguide the graph neural networks such that they perform worse than a
simple baseline that ignores all relational information. Our attacks do not
assume any knowledge about or access to the target classifiers.
Comment: ICLR submission
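A minimal PyTorch sketch of the meta-gradient idea on a relaxed, continuous adjacency matrix: the inner training loop is unrolled so the graph stays inside the computation graph, and the attacker then differentiates the training loss with respect to the structure. The toy one-layer linear model, sizes, and learning rate are assumptions; the actual method operates on discrete edge flips with additional constraints.

```python
import torch

n, d, c = 8, 5, 2
X = torch.randn(n, d)
y = torch.randint(0, c, (n,))
A_param = torch.rand(n, n, requires_grad=True)  # graph as a "hyperparameter"
W = torch.zeros(d, c, requires_grad=True)
inner_lr, loss_fn = 0.1, torch.nn.CrossEntropyLoss()

# Unrolled inner training: keep the graph inside the computation graph.
for _ in range(5):
    A = (A_param + A_param.t()) / 2           # symmetrize
    loss = loss_fn(A @ X @ W, y)              # toy one-layer linear "GCN"
    grad_W, = torch.autograd.grad(loss, W, create_graph=True)
    W = W - inner_lr * grad_W                 # differentiable update

# Meta-gradient: how the *training* loss reacts to structure changes.
meta_loss = loss_fn(((A_param + A_param.t()) / 2) @ X @ W, y)
meta_grad = torch.autograd.grad(meta_loss, A_param)[0]
# A discrete attack would flip the edge with the largest meta-gradient.
print(meta_grad.abs().argmax())
```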
Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking
Methods that learn representations of nodes in a graph play a critical role
in network analysis since they enable many downstream learning tasks. We
propose Graph2Gauss, an approach that efficiently learns versatile node
embeddings on large-scale (attributed) graphs, embeddings that show strong
performance on tasks such as link prediction and node classification. Unlike most approaches
that represent nodes as point vectors in a low-dimensional continuous space, we
embed each node as a Gaussian distribution, allowing us to capture uncertainty
about the representation. Furthermore, we propose an unsupervised method that
handles inductive learning scenarios and is applicable to different types of
graphs: plain/attributed, directed/undirected. By leveraging both the network
structure and the associated node attributes, we are able to generalize to
unseen nodes without additional training. To learn the embeddings we adopt a
personalized ranking formulation w.r.t. the node distances that exploits the
natural ordering of the nodes imposed by the network structure. Experiments on
real-world networks demonstrate the high performance of our approach,
outperforming state-of-the-art network embedding methods on several different
tasks. Additionally, we demonstrate the benefits of modeling uncertainty: by
analyzing it, we can estimate neighborhood diversity and detect the intrinsic
latent dimensionality of a graph.
Comment: Updated: ICLR 2018 camera-ready version
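A small sketch of the two ingredients described above, assuming diagonal Gaussians: the asymmetric KL-based energy between two node embeddings, and a square-exponential ranking term for a triplet (i, j, k) where j should be ranked closer to i than k. The toy means and variances below are placeholders, not learned embeddings.

```python
import numpy as np

def kl_diag_gauss(mu_i, var_i, mu_j, var_j):
    """Energy E_ij = KL(N_j || N_i) for diagonal Gaussian embeddings."""
    return 0.5 * np.sum(var_j / var_i + (mu_i - mu_j) ** 2 / var_i
                        - 1.0 + np.log(var_i) - np.log(var_j))

def triplet_loss(E_ij, E_ik):
    """Square-exponential ranking term: pull the closer node j in,
    push the farther node k out."""
    return E_ij ** 2 + np.exp(-E_ik)

rng = np.random.default_rng(0)
mu = rng.normal(size=(3, 4))             # toy means for nodes i, j, k
var = np.exp(rng.normal(size=(3, 4)))    # positive toy variances
E_ij = kl_diag_gauss(mu[0], var[0], mu[1], var[1])
E_ik = kl_diag_gauss(mu[0], var[0], mu[2], var[2])
print(triplet_loss(E_ij, E_ik))
```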
Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift
We might hope that when faced with unexpected inputs, well-designed software
systems would fire off warnings. Machine learning (ML) systems, however, which
depend strongly on properties of their inputs (e.g. the i.i.d. assumption),
tend to fail silently. This paper explores the problem of building ML systems
that fail loudly, investigating methods for detecting dataset shift,
identifying exemplars that most typify the shift, and quantifying shift
malignancy. We focus on several datasets and various perturbations to both
covariates and label distributions with varying magnitudes and fractions of
data affected. Interestingly, we show that across the dataset shifts that we
explore, a two-sample-testing-based approach, using pre-trained classifiers for
dimensionality reduction, performs best. Moreover, we demonstrate that
domain-discriminating approaches tend to be helpful for characterizing shifts
qualitatively and determining whether they are harmful.
Comment: Advances in Neural Information Processing Systems (NeurIPS) 2019
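A hedged sketch of the best-performing recipe summarized above: reduce dimensionality via a pre-trained classifier's softmax outputs, then run per-dimension two-sample Kolmogorov-Smirnov tests with a Bonferroni correction. The synthetic Dirichlet samples below stand in for the outputs of a real classifier.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_shift(probs_source, probs_target, alpha=0.05):
    """Per-dimension two-sample KS tests on classifier softmax outputs,
    Bonferroni-corrected; rejecting any single test flags a shift."""
    k = probs_source.shape[1]
    p_values = [ks_2samp(probs_source[:, i], probs_target[:, i]).pvalue
                for i in range(k)]
    return min(p_values) < alpha / k

rng = np.random.default_rng(0)
src = rng.dirichlet([1, 1, 1], size=500)      # stand-in for softmax outputs
same = rng.dirichlet([1, 1, 1], size=500)     # no shift
shifted = rng.dirichlet([5, 1, 1], size=500)  # skewed label distribution
print(detect_shift(src, same), detect_shift(src, shifted))  # expect False True
```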
Adversarial Attacks on Neural Networks for Graph Data
Deep learning models for graphs have achieved strong performance for the task
of node classification. Despite their proliferation, currently there is no
study of their robustness to adversarial attacks. Yet, in domains where they
are likely to be used, e.g. the web, adversaries are common. Can deep learning
models for graphs be easily fooled? In this work, we introduce the first study
of adversarial attacks on attributed graphs, specifically focusing on models
exploiting ideas of graph convolutions. In addition to attacks at test time, we
tackle the more challenging class of poisoning/causative attacks, which focus
on the training phase of a machine learning model. We generate adversarial
perturbations targeting the node's features and the graph structure, thus
taking the dependencies between instances into account. Moreover, we ensure that
the perturbations remain unnoticeable by preserving important data
characteristics. To cope with the underlying discrete domain we propose an
efficient algorithm, Nettack, that exploits incremental computations. Our
experimental study shows that the accuracy of node classification drops
significantly even after only a few perturbations. Moreover, our attacks are
transferable: the learned attacks generalize to other state-of-the-art node
classification models and unsupervised approaches, and likewise are successful
even when only limited knowledge about the graph is given.
Comment: Accepted as a full paper at KDD 2018 on May 6, 2018
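A greatly simplified sketch of the structure-attack ingredient: score single edge flips incident to a target node by the resulting classification margin under a linearized two-layer GCN surrogate. Nettack's incremental computations and unnoticeability constraints are omitted; the random data and unnormalized adjacency are assumptions.

```python
import numpy as np

def surrogate_logits(A, X, W):
    """Linearized two-layer GCN surrogate: A @ A @ X @ W.
    (The paper normalizes the adjacency; normalization is omitted here.)"""
    return A @ A @ X @ W

def best_structure_flip(A, X, W, target, label):
    """Try every single edge flip incident to `target` and return the flip
    that most reduces the target's classification margin."""
    best, best_margin = None, np.inf
    for j in range(A.shape[0]):
        if j == target:
            continue
        A2 = A.copy()
        A2[target, j] = A2[j, target] = 1.0 - A2[target, j]  # flip the edge
        logits = surrogate_logits(A2, X, W)[target]
        margin = logits[label] - np.max(np.delete(logits, label))
        if margin < best_margin:
            best, best_margin = (target, j), margin
    return best, best_margin

rng = np.random.default_rng(0)
n, d, c = 6, 4, 2
A = np.triu((rng.random((n, n)) < 0.4).astype(float), 1); A = A + A.T
X, W = rng.normal(size=(n, d)), rng.normal(size=(d, c))
print(best_structure_flip(A, X, W, target=0, label=1))
```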
Linearized and Single-Pass Belief Propagation
How can we tell when accounts are fake or real in a social network? And how
can we tell which accounts belong to liberal, conservative or centrist users?
Often, we can answer such questions and label nodes in a network based on the
labels of their neighbors and appropriate assumptions of homophily ("birds of a
feather flock together") or heterophily ("opposites attract"). One of the most
widely used methods for this kind of inference is Belief Propagation (BP) which
iteratively propagates the information from a few nodes with explicit labels
throughout a network until convergence. One main problem with BP, however, is
that there are no known exact guarantees of convergence in graphs with loops.
This paper introduces Linearized Belief Propagation (LinBP), a linearization
of BP that allows a closed-form solution via intuitive matrix equations and,
thus, comes with convergence guarantees. It handles homophily, heterophily, and
more general cases that arise in multi-class settings. Plus, it allows a
compact implementation in SQL. The paper also introduces Single-pass Belief
Propagation (SBP), a "localized" version of LinBP that propagates information
across every edge at most once and for which the final class assignments depend
only on the nearest labeled neighbors. In addition, SBP allows fast incremental
updates in dynamic networks. Our runtime experiments show that LinBP and SBP
are orders of magnitude faster than standard BP.
Comment: 17 pages, 11 figures, 4 algorithms. Includes the following major
changes since v1: renaming of "turbo BP" to "single-pass BP", convergence
criteria now give sufficient *and* necessary conditions, more detailed
experiments, more detailed comparison with prior BP convergence results,
overall improved exposition
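A compact numpy sketch of the linearized update that the closed form solves, iterated to its fixed point. The toy path graph, coupling matrix, and convergence setup are illustrative; see the paper for the exact centering and the sufficient-and-necessary convergence conditions.

```python
import numpy as np

def linbp(A, E_hat, H_hat, iters=100):
    """Iterate the linearized update B = E + A @ B @ H to its fixed point.
    E_hat: centered explicit beliefs; H_hat: centered coupling matrix
    (positive diagonal = homophily). Converges for small enough spectra."""
    B = E_hat.copy()
    for _ in range(iters):
        B = E_hat + A @ B @ H_hat
    return B

# Toy: a 4-node path; node 0 is labeled class 0, node 3 is labeled class 1.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
E = np.zeros((4, 2)); E[0, 0] = E[3, 1] = 1.0
E_hat = E - E.mean(axis=1, keepdims=True)   # centered residual beliefs
H_hat = 0.1 * np.array([[1, -1], [-1, 1]])  # mild homophily, centered
print(linbp(A, E_hat, H_hat).argmax(axis=1))  # -> [0 0 1 1]
```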
Certifiable Robustness to Graph Perturbations
Despite the exploding interest in graph neural networks there has been little
effort to verify and improve their robustness. This is even more alarming given
recent findings showing that they are extremely vulnerable to adversarial
attacks on both the graph structure and the node attributes. We propose the
first method for verifying certifiable (non-)robustness to graph perturbations
for a general class of models that includes graph neural networks and
label/feature propagation. By exploiting connections to PageRank and Markov
decision processes our certificates can be efficiently (and under many threat
models exactly) computed. Furthermore, we investigate robust training
procedures that increase the number of certifiably robust nodes while
maintaining or improving the clean predictive accuracy.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
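Only the PageRank building block is easy to sketch here: the personalized PageRank power iteration below (toy graph and damping factor assumed) computes the quantity whose worst-case change under admissible edge perturbations the paper's certificates bound. The certificate computation itself, via Markov decision processes, is not reproduced.

```python
import numpy as np

def personalized_pagerank(A, source, alpha=0.85, iters=200):
    """Power iteration for the personalized PageRank vector of `source`:
    pi = alpha * pi @ P + (1 - alpha) * e_source, with P row-stochastic."""
    n = A.shape[0]
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    e = np.zeros(n); e[source] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * pi @ P + (1.0 - alpha) * e
    return pi

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
print(personalized_pagerank(A, source=0).round(3))
```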
KDD-SC: Subspace Clustering Extensions for Knowledge Discovery Frameworks
Analyzing high-dimensional data is a challenging task. For such data, it is
known that traditional clustering algorithms fail to detect meaningful
patterns. As a solution, subspace clustering techniques have been introduced.
They analyze arbitrary subspace projections of the data to detect clustering
structures.
In this paper, we present our subspace clustering extension for KDD
frameworks, termed KDD-SC. In contrast to existing subspace clustering
toolkits, our solution is neither a standalone product nor tightly
coupled to a specific KDD framework. Our extension is realized by a common
codebase and easy-to-use plugins for three of the most popular KDD frameworks,
namely KNIME, RapidMiner, and WEKA. KDD-SC extends these frameworks such that
they offer a wide range of different subspace clustering functionalities. It
provides a multitude of algorithms, data generators, evaluation measures, and
visualization techniques specifically designed for subspace clustering. These
functionalities integrate seamlessly with the frameworks' existing features
such that they can be flexibly combined. KDD-SC is publicly available on our
website.
Comment: 8 pages, 8 figures
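To make the subspace idea from the abstract concrete: clusters invisible in the full space can be obvious in the right projection. A toy sketch with k-means restricted to chosen dimensions follows; this is only an illustration of the principle, not one of the subspace clustering algorithms KDD-SC ships.

```python
import numpy as np
from sklearn.cluster import KMeans

def subspace_kmeans(X, dims, k):
    """Cluster the data only in the projection onto the dimensions `dims`."""
    return KMeans(n_clusters=k, n_init=10).fit_predict(X[:, dims])

rng = np.random.default_rng(0)
# Two clusters separable only in dimension 0; dimensions 1-9 are pure noise.
X = rng.normal(size=(200, 10))
X[:100, 0] += 6
print(subspace_kmeans(X, dims=[0], k=2)[:5])                 # clean split
print(subspace_kmeans(X, dims=list(range(1, 10)), k=2)[:5])  # noise split
```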
Pitfalls of Graph Neural Network Evaluation
Semi-supervised node classification in graphs is a fundamental problem in
graph mining, and the recently proposed graph neural networks (GNNs) have
achieved unparalleled results on this task. Due to their massive success, GNNs
have attracted a lot of attention, and many novel architectures have been put
forward. In this paper we show that existing evaluation strategies for GNN
models have serious shortcomings. We show that using the same
train/validation/test splits of the same datasets, combined with
significant changes to the training procedure (e.g. early stopping criteria),
precludes a fair comparison of different architectures. We perform a thorough
empirical evaluation of four prominent GNN models and show that considering
different splits of the data leads to dramatically different rankings of
models. Even more importantly, our findings suggest that simpler GNN
architectures are able to outperform the more sophisticated ones if the
hyperparameters and the training procedure are tuned fairly for all models.
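A sketch of the evaluation protocol this finding argues for: re-running every model over many random splits and reporting aggregate scores instead of a single fixed split. The classifier and synthetic data below are placeholders for a GNN and a graph benchmark.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def evaluate_over_splits(model_factory, X, y, n_splits=10, seed=0):
    """Re-run a model on many random train/test splits and report the
    mean and standard deviation of the test accuracy."""
    scores = []
    for s in range(n_splits):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.8, random_state=seed + s, stratify=y)
        scores.append(model_factory().fit(Xtr, ytr).score(Xte, yte))
    return float(np.mean(scores)), float(np.std(scores))

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
print(evaluate_over_splits(LogisticRegression, X, y))
```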
Adversarial Attacks on Node Embeddings via Graph Poisoning
The goal of network representation learning is to learn low-dimensional node
embeddings that capture the graph structure and are useful for solving
downstream tasks. However, despite the proliferation of such methods, there is
currently no study of their robustness to adversarial attacks. We provide the
first adversarial vulnerability analysis on the widely used family of methods
based on random walks. We derive efficient adversarial perturbations that
poison the network structure and have a negative effect on both the quality of
the embeddings and the downstream tasks. We further show that our attacks are
transferable since they generalize to many models and are successful even when
the attacker is restricted.
Comment: ICML 2019, PMLR 97:695-704
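A tiny sketch of the kind of spectral reasoning involved: a first-order perturbation-theory estimate of how flipping one edge (i, j) shifts each eigenvalue of the symmetric adjacency matrix, which in the paper feeds an approximation of the random-walk embedding loss. The full spectrum-based loss and candidate selection are not reproduced.

```python
import numpy as np

def eigen_shift_scores(A, candidate_flips):
    """First-order estimate of each eigenvalue's shift when flipping edge
    (i, j): d_lambda_k ~= dw * 2 * u_k[i] * u_k[j] for symmetric A."""
    lam, U = np.linalg.eigh(A)
    scores = {}
    for (i, j) in candidate_flips:
        dw = 1.0 - 2.0 * A[i, j]  # +1 adds the edge, -1 removes it
        scores[(i, j)] = dw * 2.0 * U[i, :] * U[j, :]  # shift per eigenvalue
    return lam, scores

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
lam, scores = eigen_shift_scores(A, [(0, 2)])
print(lam, scores[(0, 2)])
```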